From Data to Dwellings:

Decoding Amsterdam’s Housing Prices

Ayushman Anupam
Chennai Mathematical Institute


Introduction


This project aims to analyze and predict housing prices in Amsterdam using comprehensive data collected in August 2021. The Amsterdam housing market has experienced significant fluctuations in recent years, driven by various factors such as economic conditions, demographics, and housing policies. Understanding the dynamics of this market is crucial for buyers, sellers, and investors alike. Therefore, the primary objective of this analysis is to identify and comprehend the trends that influence housing prices in Amsterdam. By examining a rich dataset that includes detailed information about house prices and their associated features, we seek to uncover variables that exhibit a strong correlation with housing prices.

In our exploration of the dataset, we will focus on identifying key predictors of housing prices. These predictors may include various attributes such as the area of the property, the number of rooms, the location’s longitude and latitude, and other relevant features. Through exploratory data analysis (EDA), we will visualize these relationships to determine which factors most significantly impact housing prices.

Furthermore, the project will assess whether the identified predictors can be effectively employed in a predictive model to estimate housing prices and forecast market trends. By leveraging statistical techniques and machine learning algorithms, we aim to develop a robust model that offers a good fit for predicting housing prices based on the available dataset. This model will not only facilitate a deeper understanding of how various factors interact and contribute to price fluctuations but also provide practical applications for stakeholders in the real estate market.



Data set Description


This dataset contains detailed information about house prices in Amsterdam, Netherlands, as of August 2021.

The housing prices have been obtained from Pararius.nl as a snapshot in August 2021. The original data provided features such as price, floor area and the number of rooms. The data has been further enhanced by utilising the Mapbox API to obtain the coordinates of each listing.

The Amsterdam House Price Prediction data set contains 924 records and includes features such as Address, Zip, Rooms, Area, Lat, Lon and Price as defined below. However, there are 4 missing values in the “Price” field. To ensure data integrity, these records with missing prices are removed before further analysis or modeling. This step ensures accurate predictions by eliminating incomplete data points, which could skew the results of the machine learning models designed to predict house prices based on the remaining features.
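The cleaning step described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code used for this report (whose printed output suggests R). The inline sample, the helper name `drop_missing_price`, and the use of `NA` or an empty string to mark a missing price are all assumptions; the third sample row is hypothetical and exists only to show a missing value.

```python
import csv
import io

# Small inline sample in the same column layout as the data set.
# The first two rows are taken from the report; the third row is
# HYPOTHETICAL and only illustrates a missing price (marked "NA").
SAMPLE_CSV = """Zip,Price,Area,Room,Lon,Lat
1091 CR,685000,64,3,4.907736,52.35616
1059 EL,475000,60,3,4.850476,52.34859
0000 XX,NA,70,3,4.900000,52.37000
"""

# Markers treated as "missing" (an assumption about the file encoding).
MISSING = {"", "NA"}


def drop_missing_price(rows):
    """Return only the rows whose Price field is present."""
    return [row for row in rows if row["Price"].strip() not in MISSING]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
clean = drop_missing_price(rows)
print(f"{len(rows)} rows read, {len(clean)} kept")
```

Dropping rows is a reasonable choice here because Price is the target variable: a record without it cannot be used to fit or evaluate a price model.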

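The 7 features recorded for each house for sale in and around Amsterdam are:

    • Address: street address of the listing
    • Zip: postal code of the listing
    • Price: listed price in euros
    • Area: floor area in square meters
    • Room: number of rooms
    • Lon: longitude of the listing
    • Lat: latitude of the listing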


The first six rows of the data set look like this:

##   X              Address     Zip  Price Area Room      Lon      Lat
## 1 1 . .. ... ..... ..... 1091 CR 685000   64    3 4.907736 52.35616
## 2 2 . .. ... ..... ..... 1059 EL 475000   60    3 4.850476 52.34859
## 3 3 . .. ... ..... ..... 1097 SM 850000  109    4 4.944774 52.34378
## 4 4 . .. ... ..... ..... 1060 TH 580000  128    6 4.789928 52.34371
## 5 5 . .. ... ..... ..... 1036 KN 720000  138    5 4.902503 52.41054
## 6 6 . .. ... ..... ..... 1051 AM 450000   53    2 4.875024 52.38223



Exploratory Data Analysis


Figure 01. Histograms of (a) Price, (b) Number of Rooms and (c) Area of House


Figure 02. (a) Number of Rooms vs Price of House and (b) Area vs Price of House


Figure 03. (a) Longitude vs Price and (b) Latitude vs Price


Figure 04. Plot of Area vs Rooms


Figure 05. Plot of House Locations (Longitude and Latitude), Price and Area


Figure 06. Plot of Rooms vs Area coloured by Price


Figure 07. Interactive 3D plot of Area vs Room vs Price


Figure 08. Interactive 3D plot of Latitude vs Longitude vs Price


Correlation between Variables



Figure 09. Heat map of correlation between fields of Amsterdam House data set


The correlation matrix for the numeric fields is:

##                 Price       Area       Rooms   Longitude    Latitude
## Price      1.00000000 0.83509018  0.62344800 -0.01356113  0.06219568
## Area       0.83509018 1.00000000  0.80828526  0.02176190  0.01417911
## Rooms      0.62344800 0.80828526  1.00000000 -0.02575327 -0.02116819
## Longitude -0.01356113 0.02176190 -0.02575327  1.00000000 -0.18344478
## Latitude   0.06219568 0.01417911 -0.02116819 -0.18344478  1.00000000
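For reference, each entry of such a matrix is a Pearson correlation coefficient. The sketch below computes it by hand in Python for Price, Area and Rooms, using only the six sample rows printed in the data set description. It is an illustration rather than the code behind the matrix above, the helper name `pearson` is an assumption, and with only six rows the values will not match the full-data matrix.

```python
import math

# First six rows of the data set, as printed in the data set description.
price = [685000, 475000, 850000, 580000, 720000, 450000]
area = [64, 60, 109, 128, 138, 53]
rooms = [3, 3, 4, 6, 5, 2]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


columns = {"Price": price, "Area": area, "Rooms": rooms}

# Build the full symmetric matrix as a nested dictionary.
matrix = {
    a: {b: pearson(columns[a], columns[b]) for b in columns}
    for a in columns
}

for name, row in matrix.items():
    print(name, [round(v, 3) for v in row.values()])
```

The diagonal is 1 by construction and the matrix is symmetric, which is a quick sanity check on any correlation table.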



Result


The exploratory data analysis (EDA) conducted on the Amsterdam housing dataset provides valuable insights into the factors influencing housing prices. The key findings from the EDA are summarized below:

  1. Univariate Analysis:
    • Price: The histogram reveals a right-skewed distribution, with a mean price of around €580,000.
    • Area: Most properties range between 50 and 150 square meters.
    • Rooms: Houses with 3 or 4 rooms are most common.
  2. Bivariate Analysis:
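    • Rooms vs. Price: Moderate positive correlation (r ≈ 0.62), i.e., houses with more rooms tend to be priced higher.
    • Area vs. Price: Strong positive correlation (r ≈ 0.84), i.e., larger homes tend to be more expensive.
    • Price vs. Longitude/Latitude: Near-zero linear correlation (r ≈ -0.01 and r ≈ 0.06), so any influence of location on price is not captured by a linear relationship with either coordinate alone.
    • Rooms vs. Area: High correlation between room count and area (r ≈ 0.81).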
  3. Multivariate Analysis:
    • Price, Area, and Location: Geographic location influences prices, though area remains a stronger predictor.
    • Rooms vs. Area by Price: Higher-priced houses typically have more rooms and larger areas.
    • Correlation Heatmap: Price is most strongly correlated with area (r ≈ 0.84) and rooms (r ≈ 0.62), while longitude and latitude show almost no linear correlation with price.
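
Since Area is the strongest correlate of Price in the correlation matrix, a one-variable least-squares line is the simplest way to turn that finding into a predictor. The sketch below is a hedged illustration in Python, not the model used in this project. The helper names `fit_line` and `predict` are assumptions, and the fit uses only the six sample rows printed in the data set description, so the coefficients are not meaningful estimates for the full data set.

```python
# First six rows of the data set (Area in square meters, Price in euros).
area = [64, 60, 109, 128, 138, 53]
price = [685000, 475000, 850000, 580000, 720000, 450000]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted y for a single value of x."""
    return intercept + slope * x


intercept, slope = fit_line(area, price)
print(f"Price ~ {intercept:.0f} + {slope:.0f} * Area")
print(f"Predicted price for 100 square meters: {predict(intercept, slope, 100):.0f}")
```

Because Rooms and Area are themselves highly correlated, adding Rooms as a second predictor may contribute little beyond Area; that would need to be checked on the full data set.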



Conclusion


This project successfully analyzed and predicted housing prices in Amsterdam by examining a comprehensive dataset from August 2021. Through univariate, bivariate, and multivariate data analyses, we identified significant trends and correlations between various predictors such as area, number of rooms, and geographic location.